{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 第四章 决策树"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 4.1 试证明对于不含冲突数据(即特征向量完全相同但标记不同)的训练集，必存在与训练集一致(即训练误差为 0) 的决策树。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "若数据集本身无误，决策树理论上也无误。\n",
    "\n",
    "我们对某次判断作终止条件的分析：\n",
    "\n",
    "（1）当前结点包含样本全属于同一类别；\n",
    "\n",
    "（2）当前属性集为空，或所有样本在所有属性上取值相同；\n",
    "\n",
    "（3）当前结点包含样本集为空。\n",
    "\n",
    "显然，若判断终止，（2）时，若当前属性集为空则取标记多的为分类结果，无误差，若样本在所有属性上取值相同，因数据不冲突，故必为同类，等价于（1），（1）时，分类完毕，无误差，（3）回溯上一次判断的情况，必然属于（1）（2）（3）中一种。\n",
    "\n",
    "故不含冲突数据的训练集必存在无训练误差决策树。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 4.2 试析使用\"最小训练误差\"作为决策树划分选择准则的缺陷。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "模型泛化能力较差。类似于多元回归模型，有些变量应该被剔除，决策树也存在过拟合问题，即不仅学习了数据的泛化特征，也学习了训练集本身特点。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 4.3 试编程实现基于信息熵进行划分选择的决策树算法，并为表 4.3 中数据生成一棵决策树。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "参考：\n",
    "\n",
    "[机器学习(周志华) 参考答案 第四章 决策树 python重写版与画树算法](https://blog.csdn.net/icefire_tyh/article/details/54575527)\n",
    "\n",
    "[决策树(Decision Tree) | 算法实现](https://blog.csdn.net/sudden2012/article/details/82810899)\n",
    "\n",
    "[Python实现树结构](https://blog.csdn.net/m0_37324740/article/details/79435814)\n",
    "\n",
    "[Python数据结构应用6——树](https://www.cnblogs.com/bjwu/p/9016566.html)\n",
    "\n",
    "[如何实现Python多叉树](https://zhidao.baidu.com/question/1817955110065615148.html)\n",
    "\n",
    "[python-去除二维数组/二维列表中的重复行](https://blog.csdn.net/u012991043/article/details/81067207)\n",
    "\n",
    "PS:这一章就没啥公式，全在看数据结构与算法，总感觉我写的树结构怪怪的。写到一半发现df切片挺麻烦的，还是应该写一个melon类，但是懒得改了，以后再整理。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "依据信息熵增益选择最优属性那部分需要完善，当前只是根据单一准则如Gain进行最优选择，没有写启发式。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:39:59.182414Z",
     "start_time": "2020-02-02T14:39:53.786001Z"
    },
    "hidden": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:39:59.454633Z",
     "start_time": "2020-02-02T14:39:59.351356Z"
    },
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 提取数据\n",
    "read_path = 'D:\\JWE\\Files\\Courses\\数据挖掘\\melon_data.csv'\n",
    "melon_data = pd.read_csv(read_path)\n",
    "# melon_data.columns\n",
    "# data0 = melon_data.drop(columns=['编号'])\n",
    "data0 = melon_data[:]\n",
    "# T_midu = round(float(data0.loc[:,['密度']].median().values),4)\n",
    "# T_hantang = round(float(data0.loc[:,['含糖率']].median().values),4)\n",
    "T_midu = 0.381 # 书上取划分点0.381，0.126\n",
    "T_hantang = 0.126 \n",
    "data0.loc[data0['好瓜'] == '是',['好瓜']] = '好瓜'\n",
    "data0.loc[data0['好瓜'] == '否',['好瓜']] = '坏瓜'\n",
    "data0.loc[melon_data['密度'] > T_midu,['密度']],data0.loc[melon_data['密度'] <= T_midu,['密度']] = f'大于{T_midu}',f'小于{T_midu}'\n",
    "data0.loc[melon_data['含糖率'] > T_hantang,['含糖率']],data0.loc[melon_data['含糖率'] <= T_hantang,['含糖率']] = f'大于{T_hantang}',f'小于{T_hantang}'\n",
    "# data0\n",
    "character_dict = {character:attr for character,attr in zip(data0.columns.drop(['编号','好瓜']),[set(data0.loc[:,[each]].values.flatten()) for each in data0.columns.drop(['编号','好瓜'])])}\n",
    "# character_dict\n",
    "data1 = data0.drop(columns=['密度','含糖率'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:00.010106Z",
     "start_time": "2020-02-02T14:39:59.984039Z"
    },
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 未实装\n",
    "class melon(object):\n",
    "    def __init__(self,编号,色泽,根蒂,敲声,纹理,脐部,触感,密度,含糖率):\n",
    "        self.编号 = 编号\n",
    "        self.色泽 = 色泽\n",
    "        self.根蒂 = 根蒂\n",
    "        self.敲声 = 敲声\n",
    "        self.纹理 = 纹理\n",
    "        self.脐部 = 脐部\n",
    "        self.触感 = 触感\n",
    "        self.密度 = 密度\n",
    "        self.含糖率 = 含糖率\n",
    "    def __str__(self):\n",
    "        return '<' + str(self.编号) + ',' + str(self.色泽) + ',' + str(self.根蒂) + ',' + str(self.敲声)  + ',' + str(self.纹理) + ',' + str(self.脐部) + ',' + str(self.触感) + ',' + str(self.密度) + ','+ str(self.含糖率)+ '>'\n",
    "    \n",
    "def melon_case(data):\n",
    "    return [melon(编号=each[0],色泽=each[1],根蒂=each[2],敲声=each[3],纹理=each[4],脐部=each[5],触感=each[6],密度=each[7],含糖率=each[8]) for each in data.values]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:00.642286Z",
     "start_time": "2020-02-02T14:40:00.620228Z"
    },
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 划分指标\n",
    "characters = data0.columns\n",
    "classi = '好瓜' # classi = '好瓜'\n",
    "precision = 3\n",
    "def Ent(data,character=classi,value=None,classi=classi,precision=precision):\n",
    "    Ent = 0\n",
    "    if character == classi:\n",
    "        D = data.loc[:,[classi]]\n",
    "        for each in set(data.loc[:,[classi]].values.flatten()):\n",
    "            p = len(data.loc[data[classi]==each,[classi]].values)/len(D.values)\n",
    "            Ent += -p*np.log2(p)\n",
    "    else:\n",
    "        D = data.loc[data[character]==value,[classi]]\n",
    "        for each in set(data.loc[data[character]==value,[classi]].values.flatten()):\n",
    "            p = len(data.loc[(data[character]==value)&(data[classi]==each),[classi]].values)/len(D.values)\n",
    "            Ent += -p*np.log2(p)\n",
    "    return  round(Ent,precision)\n",
    "    \n",
    "def Gain(data,character=classi,classi=classi,precision=precision):\n",
    "    Gain = Ent(data)\n",
    "    for value in set(data.loc[:,[character]].values.flatten()):\n",
    "        D = len(data)\n",
    "        D_V = len(data.loc[(data[character]==value),[classi]])\n",
    "        Gain += -(D_V/D)*Ent(data,character,value)\n",
    "    return round(Gain,precision)\n",
    "\n",
    "def IV(data,character,classi=classi):\n",
    "    IV = 0\n",
    "    for value in set(data.loc[:,[character]].values.flatten()):    \n",
    "        D = len(data)\n",
    "        D_V = len(data.loc[(data[character]==value),[classi]])\n",
    "        IV += -(D_V/D)*np.log2(D_V/D)\n",
    "    return IV\n",
    "\n",
    "def Gain_ratio(data,character,classi=classi,precision=precision):\n",
    "    Gain_ = Gain(data,character,classi=classi)\n",
    "    IV_ = IV(data,character,classi=classi)\n",
    "    eps = 10**(-4)\n",
    "    if IV_ == 0:\n",
    "        Gain_ratio = Gain_/eps\n",
    "    else:\n",
    "        Gain_ratio = Gain_/IV_\n",
    "    return round(Gain_ratio,precision)\n",
    "\n",
    "# 基尼系数\n",
    "def Gini(data,classi=classi,precision=6):\n",
    "    Gini = 1\n",
    "    D = data.loc[:,[classi]]\n",
    "    for each in set(data.loc[:,[classi]].values.flatten()):\n",
    "        p = len(data.loc[data[classi]==each,[classi]].values)/len(D.values)\n",
    "        Gini += -p**2\n",
    "    return round(Gini,6)\n",
    "\n",
    "def Gini_index(data,character,classi=classi,precision=precision):\n",
    "    Gini_a = 0\n",
    "    for value in set(data.loc[:,[character]].values.flatten()):\n",
    "        D = data\n",
    "        D_V = data.loc[data[character]==value,:]\n",
    "        Gini_a += (len(D_V)/len(D))*Gini(D_V)    \n",
    "    return round(Gini_a,precision)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:00.790602Z",
     "start_time": "2020-02-02T14:40:00.646297Z"
    },
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 决策树生成\n",
    "class Node(object):\n",
    "    def __init__(self,root=None,maxcnum=10**4):\n",
    "        self.root = root\n",
    "        self.children = []\n",
    "        self.max_children_num = maxcnum\n",
    "    def __str__(self):\n",
    "        return '[' + str(self.root) + ',' + str([each.root for each in self.children]) + ']'\n",
    "\n",
    "class Tree(object):\n",
    "    def __init__(self):\n",
    "        self.root = None\n",
    "    def add(self,node):\n",
    "        if self.root is None:\n",
    "            self.root = node\n",
    "            return\n",
    "        queue = [self.root]\n",
    "        while queue:\n",
    "            cur_node = queue.pop(0)\n",
    "            if len(cur_node.children) < cur_node.max_children_num:\n",
    "                cur_node.children.append(node)\n",
    "                return\n",
    "            else:\n",
    "                for each in cur_node.children:\n",
    "                    queue.append(each)\n",
    "    def breadth_travel(self):\n",
    "        if self.root is None:\n",
    "            return\n",
    "        queue = [self.root]\n",
    "        while queue:\n",
    "            cur_node = queue.pop(0)\n",
    "            print(cur_node.root)\n",
    "            if cur_node.children:\n",
    "                for each in cur_node.children:\n",
    "                    queue.append(each)\n",
    "    def preorder(self,node):\n",
    "        if node is None:\n",
    "            return\n",
    "        print(node.root)\n",
    "        for each in node.children:\n",
    "            self.preorder(each)      \n",
    "\n",
    "def decision_class(data):\n",
    "    value_list = list(set(data.loc[:,['好瓜']].values.flatten()))\n",
    "    if len(value_list) >= 2:\n",
    "        class_dict = {value_list[0]:len(data.loc[data['好瓜']==value_list[0],['好瓜']]),value_list[1]:len(data.loc[data['好瓜']==value_list[1],['好瓜']])}\n",
    "        dclass = max(class_dict,key=class_dict.get)\n",
    "        return dclass\n",
    "    else:\n",
    "        return value_list[0]            \n",
    "\n",
    "# 深度优先生成决策树,未剪枝\n",
    "# divide_fun为划分指标，Gain,Gain_ratio,Gini_index\n",
    "def TreeGenerate(data,character_set,tree=Tree(),divide_fun=Gain):\n",
    "    # 生成node \n",
    "    dnode = Node(root={'temp':list(data.loc[:,['编号']].values.flatten())},maxcnum=1)\n",
    "    # D中样本属于同一类别C\n",
    "    if len(set(data.loc[:,['好瓜']].values.flatten())) == 1: \n",
    "        # 将node标记为C类叶结点\n",
    "        dnode.root[list(set(data.loc[:,['好瓜']].values.flatten()))[0]] = dnode.root.pop('temp')\n",
    "        tree.add(dnode)\n",
    "        return \n",
    "    # character_set为空 or data中样本在attr上取值相同\n",
    "    if len(character_set) == 0 or len(set([tuple(each) for each in data.loc[:,list(character_set)].values])) <= 1: \n",
    "        # 将node标记为叶结点，其类别标记为D中样本数最多的类\n",
    "        dnode.root[decision_class(data)] = dnode.root.pop('temp')\n",
    "        tree.add(dnode)\n",
    "        return \n",
    "    # 从A中选择最优划分属性attr_best\n",
    "    character_dict = {character:attr for character,attr in zip(character_set,[set(data.loc[:,[each]].values.flatten()) for each in character_set])}\n",
    "    gain_dict = {character:divide_fun(data,character) for character in character_set} \n",
    "    attr_best = max(gain_dict,key=gain_dict.get)\n",
    "    if divide_fun == Gini_index:\n",
    "        attr_best = min(gain_dict,key=gain_dict.get)\n",
    "    anode = Node(root=f'{attr_best}',maxcnum=len(character_dict[attr_best]))\n",
    "    tree.add(anode)\n",
    "    # 遍历最优划分属性attr_best的取值\n",
    "    for attr in character_dict[attr_best]:\n",
    "        # 为node生成一个分支\n",
    "        tnode = Node(root={f'{attr_best}={attr}':list(data.loc[data[attr_best]==attr,['编号']].values.flatten())},maxcnum=1)\n",
    "        # 令D_v表示D在attr上取值为attr_v的样本子集\n",
    "        attr_index = [i-1 for i in data.loc[data[attr_best]==attr,['编号']].values.flatten()]\n",
    "        data_v = data.loc[attr_index,:]\n",
    "        if len(data_v) == 0: # D_v为空\n",
    "            # 将分支结点标记为叶结点，其类别标记为D中样本最多的类\n",
    "            tnode.root[decision_class(data)] = tnode.root.pop(f'{attr_best}={attr}')\n",
    "            tree.add(tnode)\n",
    "            return \n",
    "        else:\n",
    "            # 以TreeGenerate(D_v,characters\\{attr_best})为分支结点\n",
    "            tree.add(tnode)\n",
    "            character_set = list(set(character_set) - {attr_best})\n",
    "            TreeGenerate(data=data_v,character_set=character_set,tree=tree,divide_fun=divide_fun)\n",
    "\n",
    "character_set0 = list(data0.columns.drop(['编号','好瓜']))\n",
    "def main(data=data0,character_set=character_set0,divide_fun=Gain):\n",
    "    tree = Tree()\n",
    "    TreeGenerate(data,character_set,tree=tree,divide_fun=divide_fun)\n",
    "    tree.breadth_travel() \n",
    "#     tree.preorder(tree.root)\n",
    "    return tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:01.173618Z",
     "start_time": "2020-02-02T14:40:01.156073Z"
    },
    "hidden": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['密度', '脐部']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "['脐部', '含糖率', '触感', '敲声', '密度', '色泽', '纹理']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "['根蒂', '脐部', '含糖率', '触感', '敲声', '密度', '色泽', '纹理']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list({'根蒂','密度','脐部'} - {'根蒂'})\n",
    "list(set(character_set0) - {'根蒂'})\n",
    "list(set(character_set0) - set('根蒂'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:04.668913Z",
     "start_time": "2020-02-02T14:40:02.585371Z"
    },
    "hidden": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "纹理\n",
      "{'纹理=稍糊': [7, 9, 13, 14, 17]}\n",
      "触感\n",
      "{'触感=硬滑': [9, 13, 14, 17]}\n",
      "{'坏瓜': [9, 13, 14, 17]}\n",
      "{'触感=软粘': [7]}\n",
      "{'好瓜': [7]}\n",
      "{'纹理=模糊': [11, 12, 16]}\n",
      "{'坏瓜': [11, 12, 16]}\n",
      "{'纹理=清晰': [1, 2, 3, 4, 5, 6, 8, 10, 15]}\n",
      "密度\n",
      "{'密度=大于0.381': [1, 2, 3, 4, 5, 6, 8]}\n",
      "{'好瓜': [1, 2, 3, 4, 5, 6, 8]}\n",
      "{'密度=小于0.381': [10, 15]}\n",
      "{'坏瓜': [10, 15]}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<__main__.Tree at 0x25f16ad5dd8>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "含糖率\n",
      "{'含糖率=小于0.126': [9, 11, 12, 16, 17]}\n",
      "{'坏瓜': [9, 11, 12, 16, 17]}\n",
      "{'含糖率=大于0.126': [1, 2, 3, 4, 5, 6, 7, 8, 10, 13, 14, 15]}\n",
      "密度\n",
      "{'密度=大于0.381': [1, 2, 3, 4, 5, 6, 7, 8, 13, 14]}\n",
      "纹理\n",
      "{'纹理=稍糊': [7, 13, 14]}\n",
      "脐部\n",
      "{'脐部=凹陷': [13, 14]}\n",
      "{'坏瓜': [13, 14]}\n",
      "{'脐部=稍凹': [7]}\n",
      "{'好瓜': [7]}\n",
      "{'纹理=清晰': [1, 2, 3, 4, 5, 6, 8]}\n",
      "{'好瓜': [1, 2, 3, 4, 5, 6, 8]}\n",
      "{'密度=小于0.381': [10, 15]}\n",
      "{'坏瓜': [10, 15]}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<__main__.Tree at 0x25f16ae2550>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 标记为类别的就是叶结点\n",
    "main(divide_fun=Gain)\n",
    "main(divide_fun=Gain_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 4.4 试编程实现基于基尼指数进行划分选择的决策树算法，为表 4.2 中数据生成预剪枝、后剪枝决策树，并与未剪枝决策树进行比较。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "沿用上面代码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:07.531523Z",
     "start_time": "2020-02-02T14:40:07.523031Z"
    },
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 基尼系数\n",
    "def Gini(data,classi=classi,precision=6):\n",
    "    Gini = 1\n",
    "    D = data.loc[:,[classi]]\n",
    "    for each in set(data.loc[:,[classi]].values.flatten()):\n",
    "        p = len(data.loc[data[classi]==each,[classi]].values)/len(D.values)\n",
    "        Gini += -p**2\n",
    "    return round(Gini,6)\n",
    "\n",
    "def Gini_index(data,character,classi=classi,precision=precision):\n",
    "    Gini_a = 0\n",
    "    for value in set(data.loc[:,[character]].values.flatten()):\n",
    "        D = data\n",
    "        D_V = data.loc[data[character]==value,:]\n",
    "        Gini_a += (len(D_V)/len(D))*Gini(D_V)    \n",
    "    return round(Gini_a,precision)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:08.662528Z",
     "start_time": "2020-02-02T14:40:08.112066Z"
    },
    "hidden": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "纹理\n",
      "{'纹理=稍糊': [7, 9, 13, 14, 17]}\n",
      "触感\n",
      "{'触感=硬滑': [9, 13, 14, 17]}\n",
      "{'坏瓜': [9, 13, 14, 17]}\n",
      "{'触感=软粘': [7]}\n",
      "{'好瓜': [7]}\n",
      "{'纹理=模糊': [11, 12, 16]}\n",
      "{'坏瓜': [11, 12, 16]}\n",
      "{'纹理=清晰': [1, 2, 3, 4, 5, 6, 8, 10, 15]}\n",
      "密度\n",
      "{'密度=大于0.381': [1, 2, 3, 4, 5, 6, 8]}\n",
      "{'好瓜': [1, 2, 3, 4, 5, 6, 8]}\n",
      "{'密度=小于0.381': [10, 15]}\n",
      "{'坏瓜': [10, 15]}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<__main__.Tree at 0x25f16af92b0>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "main(divide_fun=Gini_index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:08.684086Z",
     "start_time": "2020-02-02T14:40:08.666039Z"
    },
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 深度优先生成决策树,预剪枝\n",
    "# divide_fun为划分指标，Gain,Gain_ratio,Gini_index\n",
    "def TreeGenerateWithPreCut(data_train,data_test,character_set,tree=Tree(),divide_fun=Gain):\n",
    "    # 生成node \n",
    "    dnode = Node(root={'temp':list(data_train.loc[:,['编号']].values.flatten())})\n",
    "    # D中样本属于同一类别C\n",
    "    if len(set(data_train.loc[:,['好瓜']].values.flatten())) <= 1: \n",
    "        # 将node标记为C类叶结点\n",
    "        dnode.root[list(set(data_train.loc[:,['好瓜']].values.flatten()))[0]] = dnode.root.pop('temp')\n",
    "        tree.add(dnode)\n",
    "        return \n",
    "    # 从A中选择最优划分属性attr_best\n",
    "    character_dict = {character:attr for character,attr in zip(character_set,[set(data_train.loc[:,[each]].values.flatten()) for each in character_set])}\n",
    "    gain_dict = {character:divide_fun(data_train,character) for character in character_set} \n",
    "    attr_best = max(gain_dict,key=gain_dict.get)\n",
    "    if divide_fun == Gini_index:\n",
    "        attr_best = min(gain_dict,key=gain_dict.get)\n",
    "    # 计算验证集上划分前后的精度accuracy\n",
    "    dclass = decision_class(data_train)\n",
    "    pre_class_dict = {f'{attr}':decision_class(data_train.loc[data_train[attr_best]==attr,:]) for attr in character_dict[attr_best]}\n",
    "    accuracy_pre = len(data_test.loc[data_test.loc[:,'好瓜']==dclass,:])/len(data_test)\n",
    "    correct_aft = [len(data_test.loc[(data_test[attr_best]==attr)&(data_test['好瓜']==dclass),:]) for attr,dclass in zip(pre_class_dict.keys(),pre_class_dict.values())]\n",
    "    accuracy_aft = sum(correct_aft)/len(data_test)\n",
    "    # character_set为空 or data中样本在attr上取值相同 or 验证集划分前精度不小于划分后精度\n",
    "    if len(character_set) == 0 or len(set([tuple(each) for each in data_train.loc[:,list(character_set)].values])) <= 1 or accuracy_pre >= accuracy_aft: \n",
    "        # 将node标记为叶结点，其类别标记为D中样本数最多的类\n",
    "        dnode.root[decision_class(data_train)] = dnode.root.pop('temp')\n",
    "        tree.add(dnode)\n",
    "        return \n",
    "    # 选择最优划分属性attr_best为分支结点\n",
    "    anode = Node(root=f'{attr_best}')\n",
    "    tree.add(anode)\n",
    "    # 遍历最优划分属性attr_best的取值\n",
    "    for attr in character_dict[attr_best]:\n",
    "        # 为node生成一个分支\n",
    "        tnode = Node(root={f'{attr_best}={attr}':list(data_train.loc[data_train[attr_best]==attr,['编号']].values.flatten())})\n",
    "        # 令D_v表示D在attr上取值为attr_v的样本子集\n",
    "        attr_index = [i-1 for i in data_train.loc[data_train[attr_best]==attr,['编号']].values.flatten()]\n",
    "        data_v = data_train.loc[attr_index,:]\n",
    "        if len(data_v) == 0: # D_v为空\n",
    "            # 将分支结点标记为叶结点，其类别标记为D中样本最多的类\n",
    "            tnode.root[decision_class(data_train)] = tnode.root.pop(f'{attr_best}={attr}')\n",
    "            tree.add(tnode)\n",
    "            return \n",
    "        else:\n",
    "            # 以TreeGenerate(D_v,characters\\{attr})为分支结点\n",
    "            tree.add(tnode)\n",
    "            character_set1 = list(set(character_set) - {attr_best})\n",
    "            TreeGenerateWithPreCut(data_train=data_v,data_test=data_test,character_set=character_set1,tree=tree,divide_fun=divide_fun)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:09.188452Z",
     "start_time": "2020-02-02T14:40:09.174389Z"
    },
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# data1\n",
    "train_index = [1,2,3,6,7,10,14,15,16,17]\n",
    "test_index = [4,5,8,9,11,12,13]\n",
    "data_train = data1.loc[[i-1 for i in train_index],:]\n",
    "data_test = data1.loc[[i-1 for i in test_index],:]\n",
    "character_set1 = list(data1.columns.drop(['编号','好瓜']))\n",
    "def PreCutTree(data_train=data_train,data_test=data_test,character_set=character_set1,divide_fun=Gini_index):\n",
    "    character_set1 = list(data1.columns.drop(['编号','好瓜']))\n",
    "    tree = Tree()\n",
    "    TreeGenerateWithPreCut(data_train=data_train,data_test=data_test,character_set=character_set1,tree=tree,divide_fun=divide_fun)\n",
    "    tree.breadth_travel() \n",
    "    return tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:12.321261Z",
     "start_time": "2020-02-02T14:40:10.597671Z"
    },
    "hidden": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "色泽\n",
      "{'色泽=乌黑': [2, 3, 7, 15]}\n",
      "{'好瓜': [2, 3, 7, 15]}\n",
      "{'色泽=浅白': [14, 16]}\n",
      "{'坏瓜': [14, 16]}\n",
      "{'色泽=青绿': [1, 6, 10, 17]}\n",
      "敲声\n",
      "{'敲声=清脆': [10]}\n",
      "{'坏瓜': [10]}\n",
      "{'敲声=沉闷': [17]}\n",
      "{'坏瓜': [17]}\n",
      "{'敲声=浊响': [1, 6]}\n",
      "{'好瓜': [1, 6]}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<__main__.Tree at 0x25f16ae2b38>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "纹理\n",
      "{'纹理=稍糊': [7, 9, 13, 14, 17]}\n",
      "触感\n",
      "{'触感=硬滑': [9, 13, 14, 17]}\n",
      "{'坏瓜': [9, 13, 14, 17]}\n",
      "{'触感=软粘': [7]}\n",
      "{'好瓜': [7]}\n",
      "{'纹理=模糊': [11, 12, 16]}\n",
      "{'坏瓜': [11, 12, 16]}\n",
      "{'纹理=清晰': [1, 2, 3, 4, 5, 6, 8, 10, 15]}\n",
      "触感\n",
      "{'触感=硬滑': [1, 2, 3, 4, 5, 8]}\n",
      "{'好瓜': [1, 2, 3, 4, 5, 8]}\n",
      "{'触感=软粘': [6, 10, 15]}\n",
      "根蒂\n",
      "{'根蒂=稍蜷': [6, 15]}\n",
      "色泽\n",
      "{'色泽=乌黑': [15]}\n",
      "{'坏瓜': [15]}\n",
      "{'色泽=青绿': [6]}\n",
      "{'好瓜': [6]}\n",
      "{'根蒂=硬挺': [10]}\n",
      "{'坏瓜': [10]}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<__main__.Tree at 0x25f16aafe10>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 预减枝与未剪枝\n",
    "divide = Gain\n",
    "PreCutTree(data_train=data_train,data_test=data_test,divide_fun=divide)\n",
    "print()\n",
    "main(data=data1,character_set=character_set1,divide_fun=Gain_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 4.5 试编程实现基于对率回归进行划分选择的决策树算法，并为表 4.3数据生成一棵决策树."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "沿用上面代码，日后再补。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 4.6 试选择 UCI 数据集，对上述 种算法所产生的未剪枝、预剪枝、后剪枝决策树进行实验比较，并进行适当的统计显著性检验。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "日后再补"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 4.7 图4.2 是一个递归算法，若面临巨量数据，则决策树的层数会很深，使用递归方法易导致\"栈\"溢出。试使用\"队列\"数据结构，以参数MaxDepth控制树的最大深度，写出与图 4.2 等价、但不使用递归的决策树生成算法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:21:16.997968Z",
     "start_time": "2020-02-02T14:21:16.986904Z"
    },
    "hidden": true
   },
   "source": [
    "沿用上面代码，用队列的话应该只能写广度优先吧，但题目说与4.2等价那应该是深度优先，应该用栈。写完懒得改了，栈和队列的区别就是先进后出和先进先出。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:13.133421Z",
     "start_time": "2020-02-02T14:40:13.102338Z"
    },
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 广度优先生成决策树，队列控制深度\n",
    "def TreeGenerateByQueue(data,tree=Tree(),divide_fun=Gain,MaxDepth=10**4):\n",
    "    character_set = list(data.columns.drop(['编号','好瓜']))\n",
    "    # 队列\n",
    "    queue = [data]\n",
    "    # 深度\n",
    "    depth = 0\n",
    "    while queue:\n",
    "#         print(len(queue)) # 调试1\n",
    "        data = queue.pop(0)\n",
    "#         print(character_set) # 调试2\n",
    "        # 生成node \n",
    "        dnode = Node(root={'temp':list(data.loc[:,['编号']].values.flatten())}) # maxcnum=1\n",
    "        # D中样本属于同一类别C\n",
    "        if len(set(data.loc[:,['好瓜']].values.flatten())) == 1: \n",
    "            # 将node标记为C类叶结点\n",
    "            dnode.root[list(set(data.loc[:,['好瓜']].values.flatten()))[0]] = dnode.root.pop('temp')\n",
    "            tree.add(dnode)\n",
    "            continue\n",
    "#             return \n",
    "        # character_set为空 or data中样本在attr上取值相同\n",
    "        if len(character_set) == 0 or len(set([tuple(each) for each in data.loc[:,list(character_set)].values])) <= 1: \n",
    "            # 将node标记为叶结点，其类别标记为D中样本数最多的类\n",
    "            dnode.root[decision_class(data)] = dnode.root.pop('temp')\n",
    "            tree.add(dnode)\n",
    "            continue\n",
    "#             return \n",
    "        if depth == MaxDepth:\n",
    "            continue\n",
    "        # 深度加1\n",
    "        depth += 1\n",
    "#         print(depth) # 调试3\n",
    "        # 从A中选择最优划分属性attr_best\n",
    "        character_dict = {character:attr for character,attr in zip(character_set,[set(data.loc[:,[each]].values.flatten()) for each in character_set])}\n",
    "        gain_dict = {character:divide_fun(data,character) for character in character_set} \n",
    "        attr_best = max(gain_dict,key=gain_dict.get)\n",
    "        if divide_fun == Gini_index:\n",
    "            attr_best = min(gain_dict,key=gain_dict.get)\n",
    "        anode = Node(root=f'{attr_best}') # maxcnum=len(character_dict[attr_best])\n",
    "        tree.add(anode)\n",
    "        # 遍历最优划分属性attr_best的取值\n",
    "        for attr in character_dict[attr_best]:\n",
    "            # 为node生成一个分支\n",
    "            tnode = Node(root={f'{attr_best}={attr}':list(data.loc[data[attr_best]==attr,['编号']].values.flatten())}) # maxcnum=1\n",
    "            # 令D_v表示D在attr上取值为attr_v的样本子集\n",
    "            attr_index = [i-1 for i in data.loc[data[attr_best]==attr,['编号']].values.flatten()]\n",
    "            data_v = data.loc[attr_index,:]\n",
    "            if len(data_v) == 0: # D_v为空\n",
    "                # 将分支结点标记为叶结点，其类别标记为D中样本最多的类\n",
    "                tnode.root[decision_class(data)] = tnode.root.pop(f'{attr_best}={attr}')\n",
    "                tree.add(tnode)\n",
    "                continue\n",
    "#                 return \n",
    "            else:\n",
    "                # 把data_v添加到队列，分支添加到树\n",
    "                tree.add(tnode)\n",
    "                character_set = list(set(character_set) - {attr_best})\n",
    "                queue.append(data_v)\n",
    "                \n",
    "def TreeByQueue(data=data0,divide_fun=Gain,MaxDepth=10**4):    \n",
    "    tree = Tree()\n",
    "    TreeGenerateByQueue(data=data,tree=tree,divide_fun=divide_fun,MaxDepth=MaxDepth)\n",
    "    tree.breadth_travel()\n",
    "    # tree.preorder(tree.root)\n",
    "    return tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:15.028959Z",
     "start_time": "2020-02-02T14:40:13.783147Z"
    },
    "hidden": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "纹理\n",
      "{'纹理=稍糊': [7, 9, 13, 14, 17]}\n",
      "{'纹理=模糊': [11, 12, 16]}\n",
      "{'纹理=清晰': [1, 2, 3, 4, 5, 6, 8, 10, 15]}\n",
      "触感\n",
      "{'触感=硬滑': [9, 13, 14, 17]}\n",
      "{'触感=软粘': [7]}\n",
      "{'坏瓜': [11, 12, 16]}\n",
      "{'坏瓜': [9, 13, 14, 17]}\n",
      "{'好瓜': [7]}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<__main__.Tree at 0x25f16b19908>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "纹理\n",
      "{'纹理=稍糊': [7, 9, 13, 14, 17]}\n",
      "{'纹理=模糊': [11, 12, 16]}\n",
      "{'纹理=清晰': [1, 2, 3, 4, 5, 6, 8, 10, 15]}\n",
      "触感\n",
      "{'触感=硬滑': [9, 13, 14, 17]}\n",
      "{'触感=软粘': [7]}\n",
      "{'坏瓜': [11, 12, 16]}\n",
      "密度\n",
      "{'密度=大于0.381': [1, 2, 3, 4, 5, 6, 8]}\n",
      "{'密度=小于0.381': [10, 15]}\n",
      "{'坏瓜': [9, 13, 14, 17]}\n",
      "{'好瓜': [7]}\n",
      "{'好瓜': [1, 2, 3, 4, 5, 6, 8]}\n",
      "{'坏瓜': [10, 15]}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<__main__.Tree at 0x25f16ae2390>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "TreeByQueue(data=data0,divide_fun=Gain,MaxDepth=2)\n",
    "TreeByQueue(data=data0,divide_fun=Gain,MaxDepth=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 4.8* 试将决策树生成的深度优先搜索过程修改为广度优先搜索，以参数MaxNode控制树的最大结点数，将题 4.7 中基于队列的决策树算法进行改写。对比题 4.7 中的算法，试析哪种方式更易于控制决策树所需存储不超出内存。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "沿用上面代码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:15.959932Z",
     "start_time": "2020-02-02T14:40:15.933863Z"
    },
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 广度优先生成决策树，控制结点数的话，只需要在每次增加结点时记录一次当前结点数，然后判断一次即可\n",
    "# 这里只控制了分支结点数量\n",
    "def TreeGenerateBreadthFirst(data,tree=Tree(),divide_fun=Gain,MaxNode=10**4):\n",
    "    character_set = list(data.columns.drop(['编号','好瓜']))\n",
    "    # 队列\n",
    "    queue = [data]\n",
    "    # 结点\n",
    "    node_num = 0\n",
    "    while queue:\n",
    "        data = queue.pop(0)\n",
    "        # 生成node \n",
    "        dnode = Node(root={'temp':list(data.loc[:,['编号']].values.flatten())}) # maxcnum=1\n",
    "        # D中样本属于同一类别C\n",
    "        if len(set(data.loc[:,['好瓜']].values.flatten())) == 1: \n",
    "            # 将node标记为C类叶结点\n",
    "            dnode.root[list(set(data.loc[:,['好瓜']].values.flatten()))[0]] = dnode.root.pop('temp')\n",
    "            tree.add(dnode)\n",
    "#             node_num += 1\n",
    "#             if node_num == MaxNode:\n",
    "#                 continue\n",
    "            continue\n",
    "#             return \n",
    "        # character_set为空 or data中样本在attr上取值相同\n",
    "        if len(character_set) == 0 or len(set([tuple(each) for each in data.loc[:,list(character_set)].values])) <= 1: \n",
    "            # 将node标记为叶结点，其类别标记为D中样本数最多的类\n",
    "            dnode.root[decision_class(data)] = dnode.root.pop('temp')\n",
    "            tree.add(dnode)\n",
    "#             node_num += 1\n",
    "#             if node_num == MaxNode:\n",
    "#                 continue\n",
    "            continue\n",
    "#             return \n",
    "        if node_num == MaxNode:\n",
    "            continue\n",
    "        # 从A中选择最优划分属性attr_best\n",
    "        character_dict = {character:attr for character,attr in zip(character_set,[set(data.loc[:,[each]].values.flatten()) for each in character_set])}\n",
    "        gain_dict = {character:divide_fun(data,character) for character in character_set} \n",
    "        attr_best = max(gain_dict,key=gain_dict.get)\n",
    "        if divide_fun == Gini_index:\n",
    "            attr_best = min(gain_dict,key=gain_dict.get)\n",
    "        anode = Node(root=f'{attr_best}') # maxcnum=len(character_dict[attr_best])\n",
    "        tree.add(anode)\n",
    "#         node_num += 1\n",
    "#         if node_num == MaxNode:\n",
    "#             continue\n",
    "        # 遍历最优划分属性attr_best的取值\n",
    "        for attr in character_dict[attr_best]:\n",
    "            # 为node生成一个分支\n",
    "            tnode = Node(root={f'{attr_best}={attr}':list(data.loc[data[attr_best]==attr,['编号']].values.flatten())}) # maxcnum=1\n",
    "            # 令D_v表示D在attr上取值为attr_v的样本子集\n",
    "            attr_index = [i-1 for i in data.loc[data[attr_best]==attr,['编号']].values.flatten()]\n",
    "            data_v = data.loc[attr_index,:]\n",
    "            if len(data_v) == 0: # D_v为空\n",
    "                # 将分支结点标记为叶结点，其类别标记为D中样本最多的类\n",
    "                tnode.root[decision_class(data)] = tnode.root.pop(f'{attr_best}={attr}')\n",
    "#                 node_num += 1\n",
    "#                 if node_num == MaxNode:\n",
    "#                     continue\n",
    "                tree.add(tnode)\n",
    "                continue\n",
    "#                 return \n",
    "            else:\n",
    "                # 把data_v添加到队列，分支添加到树\n",
    "                tree.add(tnode)\n",
    "                # 判断分支结点数量\n",
    "                node_num += 1\n",
    "                if node_num == MaxNode:\n",
    "                    continue\n",
    "                character_set = list(set(character_set) - {attr_best})\n",
    "                queue.append(data_v)\n",
    "                \n",
    "def TreeBreadthFirst(data=data0,divide_fun=Gain,MaxNode=10**4):    \n",
    "    tree = Tree()\n",
    "    TreeGenerateBreadthFirst(data=data,tree=tree,divide_fun=divide_fun,MaxNode=MaxNode)\n",
    "    tree.breadth_travel()\n",
    "    # tree.preorder(tree.root)\n",
    "    return tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-02T14:40:18.058007Z",
     "start_time": "2020-02-02T14:40:16.802671Z"
    },
    "hidden": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "纹理\n",
      "{'纹理=稍糊': [7, 9, 13, 14, 17]}\n",
      "{'纹理=模糊': [11, 12, 16]}\n",
      "{'纹理=清晰': [1, 2, 3, 4, 5, 6, 8, 10, 15]}\n",
      "{'坏瓜': [11, 12, 16]}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<__main__.Tree at 0x25f16b458d0>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "纹理\n",
      "{'纹理=稍糊': [7, 9, 13, 14, 17]}\n",
      "{'纹理=模糊': [11, 12, 16]}\n",
      "{'纹理=清晰': [1, 2, 3, 4, 5, 6, 8, 10, 15]}\n",
      "触感\n",
      "{'触感=硬滑': [9, 13, 14, 17]}\n",
      "{'触感=软粘': [7]}\n",
      "{'坏瓜': [11, 12, 16]}\n",
      "{'坏瓜': [9, 13, 14, 17]}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<__main__.Tree at 0x25f16b45e80>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "TreeBreadthFirst(data=data0,divide_fun=Gain,MaxNode=3)\n",
    "TreeBreadthFirst(data=data0,divide_fun=Gain,MaxNode=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 缺失值处理和多变量决策树日后再补。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
